Prosodic, Spectral and Visual Features for the Discrimination of Prominent and Non-prominent Words

Author

  • Martin Heckmann
Abstract

Despite their high relevance for human communication, prosodic variations in the speech signal are usually ignored by current spoken dialog systems [1, 2, 3]. In [4] it was shown that speakers use prosodic cues to highlight corrections in a dialog with a machine and that these corrections can be detected from those cues. We extended this idea in [5] to the audio-visual discrimination of prominent from non-prominent words. Visual features have been shown to play an important role in human perception of prosody [6]. In this paper we take a closer look at the information contained in the different features from the acoustic and visual channels. In particular, we investigate the contribution of the visual features, i.e. nose movement and mouth appearance, and of the spectral features.

Similar resources

Inter-speaker variability in audio-visual classification of word prominence

In this paper we present results for the audio-visual discrimination of prominent from non-prominent words on a dataset with 16 speakers and more than 5000 utterances. We collected data in an experiment where users were interacting via speech in a small game, designed as a Wizard-of-Oz experiment, with a computer. Following misunderstandings of one single word of the system, users were instruct...

Word segmentation in Persian continuous speech using F0 contour

Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...

Differences between Speakers in Audio-visual Classification of Word Prominence

We show how the audio-visual discrimination performance of prominent from non-prominent words based on an SVM classifier varies from speaker to speaker. We collected data in an experiment where users were interacting via speech in a small game, designed as a Wizard-of-Oz experiment, with a computer. Following misunderstandings of one single word of the system, users were instructed to correct t...

How does Prosody Distinguish Wh-statement from Wh-question? A Case Study of Standard Chinese

In Standard Chinese, there are wh-sentences that express either an interrogative or a declarative speech act with the same syntactic structure, such as "bǎobao chīdiănr shénme?" (What does the baby intend to eat?) and "bǎobao chīdiănr shénme." (The baby intends something to eat.) The interrogative pronoun "shénme" (what) has different semantic functions, such as specific reference in the interrogative se...

Image Transformation based Features for the Visual Discrimination of Prominent and Non-Prominent Words

This paper investigates how visual information extracted from a speaker’s mouth region can be used to discriminate prominent from non-prominent words. The analysis relies on a database where users interacted in a small game with a computer in a Wizard of Oz experiment. Users were instructed to correct recognition errors of the system. This was expected to render the corrected word highly promin...


Publication date: 2014